Partially observable Markov decision processes

Author

  • Matthijs Spaan
Abstract

For reinforcement learning in environments in which an agent has access to a reliable state signal, methods based on the Markov decision process (MDP) have had many successes. In many problem domains, however, an agent suffers from limited sensing capabilities that preclude it from recovering a Markovian state signal from its perceptions. Extending the MDP framework, partially observable Markov decision processes (POMDPs) allow for principled decision making under conditions of uncertain sensing. In this chapter we present the POMDP model by focusing on its differences from fully observable MDPs, and we show how optimal policies for POMDPs can be represented. Next, we review model-based techniques for policy computation, followed by an overview of the available model-free methods for POMDPs. We conclude by highlighting recent trends in POMDP reinforcement learning.
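Because a POMDP agent cannot observe its state directly, it typically summarizes its history as a belief, a probability distribution over states, and updates that belief with Bayes' rule after each action a and observation o: b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The following is a minimal sketch of this update for a discrete model, assuming the hypothetical array layouts described in the comments; the function name `belief_update` and the two-state "listen" model are illustrative assumptions, not taken from the chapter.

```python
from typing import List, Sequence

# Hypothetical model layout (an assumption of this sketch, not the chapter's notation):
#   T[a][s][s2] = P(s2 | s, a)   transition probabilities
#   O[a][s2][o] = P(o | s2, a)   observation probabilities
Matrix3 = Sequence[Sequence[Sequence[float]]]


def belief_update(belief: Sequence[float], action: int, observation: int,
                  T: Matrix3, O: Matrix3) -> List[float]:
    """Bayes-filter update of a discrete POMDP belief after (action, observation)."""
    n = len(belief)
    # Prediction: propagate the current belief through the transition model.
    predicted = [sum(T[action][s][s2] * belief[s] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight each successor state by the likelihood of the observation.
    weighted = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    norm = sum(weighted)  # equals P(o | b, a)
    if norm == 0.0:
        raise ValueError("observation is impossible under this belief and action")
    return [w / norm for w in weighted]


# Hypothetical two-state example: the hidden state is 0 or 1, and a single
# "listen" action (index 0) leaves it unchanged but reports it correctly
# only 85% of the time.
LISTEN_T = [[[1.0, 0.0], [0.0, 1.0]]]
LISTEN_O = [[[0.85, 0.15], [0.15, 0.85]]]

if __name__ == "__main__":
    b = [0.5, 0.5]
    for _ in range(2):
        b = belief_update(b, action=0, observation=0, T=LISTEN_T, O=LISTEN_O)
        print(b)
```

Starting from a uniform belief, one observation of state 0 moves the belief to about [0.85, 0.15], and a second consistent observation moves it to roughly [0.97, 0.03]. The agent never learns the state with certainty, which is exactly the situation that distinguishes a POMDP from a fully observable MDP.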

Similar sources

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Maintenance can either increase or decrease a system's availability, so it is worthwhile to evaluate a maintenance policy from both the cost and the availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...

Transition Entropy in Partially Observable Markov Decision Processes

This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDPs). The algorithm is based on a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance...

Increasing Scalability in Algorithms for Centralized and Decentralized Partially Observable Markov Decision Processes: Efficient Decision-Making and Coordination in Uncertain Environments

Deciding the Value 1 Problem for ♯-acyclic Partially Observable Markov Decision Processes

The value 1 problem is a natural decision problem in algorithmic game theory. For partially observable Markov decision processes with a reachability objective, this problem is defined as follows: are there strategies that achieve the reachability objective with probability arbitrarily close to 1? This problem was recently shown to be undecidable. Our contribution is to introduce a class of partially ob...


Publication date: 2007